bpo-44895: Temporarily add an extra gc.collect() call #27746
…vestigate the refleak
Let's run this on buildbots and if it indeed resolves the leak we can do this for now instead of skipping the test.
🤖 New build scheduled with the buildbot fleet by @iritkatriel for commit a6a8b1c 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
@vstinner The buildbot tests passed. This might quiet down the CI. What do you think?
Thanks @iritkatriel for the PR, and @ambv for merging it 🌮🎉. I'm working now to backport this PR to: 3.10.
GH-27753 is a backport of this pull request to the 3.10 branch.
This is part of an investigation of a non-deterministic reference leak. While we're looking for the root cause, this is included temporarily so that CI doesn't fail on this particular issue. This enables it to find other regressions in the meantime, which would otherwise be shadowed by our known issue. (cherry picked from commit 7bf28cb) Co-authored-by: Irit Katriel <[email protected]>
@@ -1014,6 +1014,9 @@ def cycle():

    def test_no_hang_on_context_chain_cycle2(self):
        # See issue 25782. Cycle at head of context chain.
This fix looks incorrect: tests should not depend on the GC to pass. When they do, it is a symptom of another problem.
I propose to revert this commit and investigate.
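For readers unfamiliar with the concern, here is a minimal sketch (not taken from the PR) of why a test can end up depending on the GC in CPython: objects in a reference cycle are not freed by reference counting alone and stay alive until the cycle collector runs. The `Node` class and `make_cycle` helper are hypothetical names used only for illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical plain object that can take part in a reference cycle."""


def make_cycle():
    # Two objects that point at each other: their refcounts never reach zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)  # lets us observe `a` without keeping it alive


gc.disable()  # rule out an automatic collection during the demo
try:
    ref = make_cycle()
    alive_before = ref() is not None  # the cycle is still in memory
    gc.collect()
    alive_after = ref() is not None   # the cycle collector has reclaimed it
finally:
    gc.enable()
```

Whether the cycle has already been collected at a given point depends on allocation thresholds, which is one way a refleak check can become non-deterministic.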
CC: @vstinner
I understand it is a "temporary measure", but in my experience those are left in place without a real fix more often than not. Also, I don't feel comfortable with temporary fixes in the release candidate.
The alternative is to disable the test. That doesn't fix the issue either.
I prefer to deactivate the test. The reason is that relying on the GC in this way at the end has global effects and can mask other issues. It is also not deterministic, and in some extreme situations involving resurrection it can actually turn into an endless loop.
This is just my opinion, of course. If the consensus is to leave this in because the test has more value, then let's leave it, but I have to say that in my previous experience these kinds of fixes are left in place more often than not.
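As a rough illustration of the "endless loop" worry, the sketch below shows a collect-until-nothing-is-found loop that would never settle. It is an assumption-laden toy, not the code from this PR: the `Phoenix` class is hypothetical, and it regenerates garbage from its finalizer rather than resurrecting the same object, which is a simpler variant of the same hazard.

```python
import gc


class Phoenix:
    """Hypothetical class: every finalized instance leaves new garbage behind."""

    active = True

    def __init__(self):
        self.me = self  # self-reference, so only the cycle collector can free it

    def __del__(self):
        if Phoenix.active:
            Phoenix()  # a fresh unreachable cycle for the next collection


Phoenix()  # seed the first piece of cyclic garbage

# A `while gc.collect(): pass` loop would not terminate here, so bound it.
counts = [gc.collect() for _ in range(5)]

Phoenix.active = False  # stop regenerating so the process can settle
gc.collect()
```

Every entry in `counts` is nonzero because each collection finalizes one instance and thereby creates the next, so a loop waiting for `gc.collect()` to return 0 would spin forever.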
Don't worry, it is not urgent.
Thanks a lot for the investigation and for all the work!! 🚀
> I don't feel comfortable with temporary fixes in the release candidate.
Sure, we weren't going to let this slip into RC2. The point was to make refleak tests able to catch other regressions on that branch in the meantime.
If you'd rather redo the fix as a skip instead of gc.collect() then that's fine as well. However, from what I understood on the PR, having it run on the entire buildbot fleet for a few days would give us more confidence whether that approach to working around the refleaks is even effective.
How about we leave it as is for the weekend and remove the gc.collect() loop on Monday?
> If you'd rather redo the fix as a skip instead of gc.collect() then that's fine as well. However, from what I understood on the PR, having it run on the entire buildbot fleet for a few days would give us more confidence whether that approach to working around the refleaks is even effective.
I don't get what you mean by this. Why do we want to know whether the workaround is effective? What information do we gain from this? I can understand the thought that this may shed some more light on the problem, but this workaround is too intrusive to draw any conclusions about the actual problem, other than that a cycle is likely involved.
> How about we leave it as is for the weekend and remove the gc.collect() loop on Monday?
👍 Works for me
> Why do we want to know if the approach to work around is effective?
Irit wrote:
> Let's run this on buildbots and if it indeed resolves the leak we can do this for now.
I misinterpreted this as "let's merge this and see" but obviously she meant the test-with-buildbots label. Never mind!
I'm fine with the workaround to unblock buildbots, but https://bugs.python.org/issue44895 must only be closed when the root issue is identified.
The regrtest test runner runs gc.collect(). regrtest -R 3:3 runs gc.collect() one more time. So it's strange that you have to add a third gc.collect() call.
The worst case that I saw was a bug in a type implemented in C: https://vstinner.github.io/subinterpreter-leaks.html Calling gc.collect() worked around this bug. But I had to fix the C type (_thread.Lock) to fix the root issue. I don't think that it's the same bug here, since that leak was only seen when an interpreter was destroyed. Here the leak is seen at each loop.
This is part of an investigation of a non-deterministic reference leak. While we're looking for the root cause, this is included temporarily so that CI doesn't fail on this particular issue. This enables it to find other regressions in the meantime, which would otherwise be shadowed by our known issue.
https://bugs.python.org/issue44895